Maximum entropy models provide functional connectivity estimates in neural networks

Tools to estimate brain connectivity offer the potential to enhance our understanding of brain functioning. The behavior of neuronal networks, including functional connectivity and induced connectivity changes by external stimuli, can be studied using models of cultured neurons. Cultured neurons tend to be active in groups, and pairs of neurons are said to be functionally connected when their firing patterns show significant synchronicity. Methods to infer functional connections are often based on pair-wise cross-correlation between activity patterns of (small groups of) neurons. However, these methods are not very sensitive to detect inhibitory connections, and they were not designed for use during stimulation. Maximum Entropy (MaxEnt) models may provide a conceptually different method to infer functional connectivity. They have the potential benefit to estimate functional connectivity during stimulation, and to infer excitatory as well as inhibitory connections. MaxEnt models do not involve pairwise comparison, but aim to capture probability distributions of sets of neurons that are synchronously active in discrete time bins. We used electrophysiological recordings from in vitro neuronal cultures on micro electrode arrays to investigate the ability of MaxEnt models to infer functional connectivity. Connectivity estimates provided by MaxEnt models correlated well with those obtained by conditional firing probabilities (CFP), an established cross-correlation based method. In addition, stimulus-induced connectivity changes were detected by MaxEnt models, and were of the same magnitude as those detected by CFP. Thus, MaxEnt models provide a potentially powerful new tool to study functional connectivity in neuronal networks.

which is essentially a simplified electrical circuit model of the neural activity. There are three main terms. The term − 1 τ V i + Θ i implicitly assumes that ion channel conductances are constants; the time constant τ relates to the membrane resistance and capacitance, while Θ i depends on the equilibrium potentials and conductances associated with the various ion channels. The term I i (t) allows for current injection to affect the membrane potential; we ignore this term and set it to 0. Finally, the term j J ij a g(t − t (a) j ) allows the firing of neuron j at a time of t (a) j to increase or decrease the rate of change of the membrane potential of neuron i by J ij g(t − t (a) j ), where J ij is a "synaptic connectivity" and g is any function, usually chosen to be exponential. More generally, the hard threshold is replaced by a "soft threshold" in which the probability of a neuron firing in a time bin of size ∆t is some function of the current membrane potential. That is, neuron i fires with rate f (V i − V th ) for some (increasing) function f . This formulation allows for a neuron to fire even when it is below threshold. This is not necessarily biophysically reasonable, but it provides a better statistical match to Maximum Entropy models. The soft threshold leads to a hard threshold when Our next goal is to get statistics of P (σ i = 1) and P (σ j = 1, σ i = 1) for the dynamical model, from which we can infer a relationship between dynamical parameters and MaxEnt parameters. Assuming stationarity, we have We can think of this in the following way: there is some mean-field activity; neurons are firing, producing spikes of neuron i with some frequency. In general, it is not permissible to set ⟨f higher-order terms might be non-negligible. Next, we have to calculate P (σ j = 1, σ i = 1). We can case this out into two cases, one in which neuron i fires first and one in which neuron j fires first. Roughly speaking, the weights of this happening correspond to the mean-field probabilities of them firing independently. The weighting factor is ⟨f . After some algebra, we find that When we have a hard threshold, then ⟨f ′ ⟩ is simply the value of the probability density function at the threshold voltage, or the firing rate. Hence, in that case, This development mirrors that of Ref. 1 , but differs in that we are assuming that the time bin size ∆t for the MaxEnt method is quite small. As such, we end up finding a direct relationship between the two connectivities. Our Eq. S4 explain the previously unexplained correspondence betweenĴ ij and J ij + J ji shown in the Appendix of Ref. 1 . In practice, we have found it difficult to find a regime such that the conditions of Eq. S4 hold.

Influence of inactive electrodes on MaxEnt connectivity estimation
We calculate MaxEnt connectivity based on all electrodes (J all ), and based on active electrodes only (J active ), and compare results for connections between active electrodes. Figure S1 is a typical example, showing a relatively high correlation coefficient between J all and J active . All 20 analysed samples yielded similar results, with an average correlation coefficient of 0.87 ± 0.02. A closer analysis reveals that there is a strong correlation between the excitatory connections in both approaches, shown in blue in Figure  S1 (R = 0.86 ± 0.02), and also a strong correlation between the inhibitory connections, shown in green (R = 0.87 ± 0.02). This resulted in a Euclidean distance between J all and J active of ≈ 8.78 ± 1.24, which is in line with distances between subsequent connectivity matrices obtained from continuous spontaneous recordings which were ≈ 22.52 ± 1.95 (see figure on spontaneous connectivity changes in main text).

Theoretical relationship between CFP and MaxEnt functional connectivity
We can relate the two statistical models. The key is to compute P (σ j (t + τ ) = 1|σ i (t) = 1), or CF P ij (τ ), assuming that the MaxEnt model is correct, and thereby relate the CFP connectivity matrix M to the MaxEnt connectivity matrix J. Our assumptions are that we are in a small-coupling limit, i.e. that the connections are weak relative to the innate propensity of each neuron to fire. Under the CFP model, the probability that two neurons i and j end up spiking in the same time bin comes from two separate integrals that ignore the contributions of other neurons (or simply assumes that those contributions wash out).
We start by relating the probability of two neurons firing in the same time bin to the probability density of those two neurons firing: Either neuron i fires first, or neuron j fires first: The expression for CF P ij (t) gives Here M i,j , T i,j , o i,j and w i,j are obtained from CFP. M i,j is interpreted as the strength of the connection, T i,j as the latency. o i,j represents uncorrelated background activity and w i,j accounts for the width of the peak. We have assumed that in the absence of neuron j firing, the probability of neuron i firing in a small time bin is where we have assumed that Here the firing rate λ is taken to be the mean firing rate over time, which includes the responses to other neurons in addition to neuron i's baseline firing rate. This same probability in the small coupling limit (J ij ≪ θ i , θ j ) in the MaxEnt model is derived as follows. Let J be the average value of J ij . First, we approximate the partition function in the small coupling limit as which yields and (S14) Now, we match these two models to get and which solves as ignoring corrections of O(∆t). Note that we can only make a statement about J ij + J ji . When it is safe to assume that T ij ≪ w ij , Hence, J ij + J ji directly reflects M ij , M ji , normalized by various other parameters related to the firing patterns and shifted by a factor that depends on the mean J ij and the mean firing rate. Note also that while θ i has a strong dependence on ∆t, J ij does not depend on the size of time bins to first order in ∆t.
If one does not assume that ∆t is small, additional terms can be added, yielding corrections of the form ignoring corrections of O(∆t 2 ). This introduces another culture-dependent shift in the zeroth-order relationship between estimated MaxEnt functional connectivities and true MaxEnt functional connectivities from Eq. S17. The timescale ∆t must be small enough so that Jij +Jji min(|θi|,|θj |) is small and so that the additional correction term shown above is typically small.
Computational load for MaxEnt and CFP scaled with the number of active electrodes and with the total number of recorded spikes. We applied MaxEnt and CFP to 1h recordings of spontaneous activity using HP Elite Desk 800 G5, processor Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz 3.00 GHz, RAM 64,0 GB, 64-bit operating system. Figure S2 shows that CFP computational load increased exponentially with increasing number of active electrodes or total number of spikes, while MaxEnt computational time scaled with the number of data points.

An analysis of the goodness of fit of Minimum Probability Flow
When finding parameters of our Maximum Entropy model, we are faced with the task of minimizing some sort of distance between model and data. In this case, we have is the partition function. Typically, practitioners take a maximum likelihood approach, in which we minimize the Kullback-Leibler divergence between model and data: where H[p data ] is the entropy of the data distribution and log L is the log likelihood. This well-known relationship between Kullback-Leibler divergence and log likelihood implies that minimizing an informationtheoretic distance between the model and data distributions is equivalent to choosing parameters by maximizing the (log) likelihood. Note that we have no control over the entropy of the data distribution, and so the H[p data ] term is irrelevant to our parameter estimation procedure. If we now proceed further expand the log likelihood, we find where the ⟨·⟩ data indicate that an average is taken with respect to the data distribution. If we search for parameters J, θ that maximize the log likelihood, we look for parameters such that the gradient of the log likelihood is 0: where some straightforward manipulation has been omitted. It seems that we have a straightforward procedure for finding J, θ, in which we try to maximize log L via some sort of gradient ascent using the gradients calculated above. But the averages with respect to the model distribution are fraught with difficulty, since we must (naively) sum over all possible ⃗ x. This can be prohibitively expensive. For example, suppose that there are 60 neurons-then there are 2 60 possible binary vectors.
The typical way around this problem is to use contrastive divergence 2 , in which some form of Monte Carlo is used to approximate the model averages ⟨⃗ x⟩ model , ⟨⃗ x⃗ x ⊤ ⟩ model . But about a decade ago, an alternative approach was discovered. Instead of trying to minimize the Kullback-Leibler divergence between data and model distributions, in Minimum Probability Flow (MPF), we imagine that there is some dynamics that takes the data distribution to the model distribution and try to minimize where p t is the transformed data distribution a time t later. This gives the following objective function For small t this reduces to an objective function of Then, following the derivation by Sohl-Dickstein and colleagues (Eqs. A1 -A9 of Appendix A) 3 , we obtained where D is the set of binary vectors in the data, E(⃗ x) = ⃗ x ⊤ θ + ⃗ x ⊤ J⃗ x and g ij is an arbitrary symmetric connectivity factor. This is the probability flow that we try to minimize. An appropriate choice of g ij yields a highly tractable objective function, and regardless of the choice of g ij , this objective function is convex. (We used g ij = 1 only when i and j were one bit flip away.) It was shown in Refs. 3 that when p data could be fit exactly by a Maximum Entropy model, MPF would find that Maximum Entropy model very quickly-more quickly than contrastive divergence, for example. We first ask whether or not MPF will do a good job of fitting a data distribution that is likely not exactly a Maximum Entropy model. We simulated a small number of simulated neurons (small enough that contrastive divergence is not required) and suppose that p data (⃗ x) is chosen from a Dirichlet distribution. Then, there is no real structure to speak of in p data (⃗ x). It is still a well-posed question to find the best possible Maximum Entropy model. But now, MPF and maximum likelihood might yield different answers. In addition to log likelihood, we monitor the total variational distance T V D = ⃗ x |p data (⃗ x) − p model (⃗ x)| between the model and data distributions. We find that, with enough data, they yield almost indistinguishable answers in terms of goodness of fit. Fig. S3 shows an example run. However, closer inspection reveals that the model and data can match without the inferred connectivity matching the true connectivity well. We simulated a neural system that behaves according to a known MaxEnt model, and then inferred parameters J and θ using MPF and MLE. The ground truth J's and θ's did not match the inferred J's and θ's, even though the first-order, second-order, and third-order moments all matched. This is a typical property of a sloppy model 4 . This implies that we cannot completely trust the J's and θ's inferred by either MLE or MPF. It is therefore somewhat of a surprise that the results in the main text validate the match between CFP and inferred MaxEnt functional connectivities.
Finally, unexpectedly, MPF does not seem to be correctly fitting the MaxEnt model on the real neural data, in that the moments of the MPF-fitted MaxEnt model do not match the first and secondorder moments of the experimental data. We checked this by using a Markov Chain Monte Carlo sampler (validated first on a subset of twenty neurons) on the 60-neuron MPF-fitted MaxEnt model and comparing the calculated first and second-order moments to the first and second-order moments of the experimental data. There was poor agreement as shown in Fig. S4. This could be a product of the sampler used, as proposal distribution can greatly affect results.
The last two facts combined lead us to a surprising conclusion. The MPF-inferred MaxEnt functional connectivities do not match the functional connectivities that we would infer if we demanded exact matches with the first and second-order moments. But the results in the main text suggest that the MPF-inferred MaxEnt functional connectivities are still correlated with the functional connectivities that we would infer if perfect matches with first and second-order moments were demanded. We can make no guarantees about functional connectivities inferred using contrastive divergence or other methods.  Figure S4: A scatterplot between the predicted means ⟨x i ⟩ and correlations ⟨x i x j ⟩ − ⟨x i ⟩⟨x j ⟩ and the means and correlations from the data for one representative experiment. The data show a wide range of means and correlations, but the predicted means and correlations tend to be 1 and 0, respectively.